Introduction to the special issue on processing under-resourced languages

Authors

  • Laurent Besacier
  • Etienne Barnard
  • Alexey Karpov
  • Tanja Schultz
Abstract

The creation of language and acoustic resources for any given spoken language is typically a costly task. For example, properly creating annotated speech corpora for automatic speech recognition (ASR) or domain-specific text corpora for language modeling (LM) requires a large amount of time and money. The development of speech technologies (ASR, text-to-speech) for languages that are already high-resourced, such as English, French or Mandarin, is less constrained by this issue; consequently, high-performance commercial systems are already on the market. For under-resourced languages, on the other hand, this cost is typically the main obstacle.

Similar resources

Automatic Speech Recognition for Under-Resourced Languages:

Speech processing for under-resourced languages is an active field of research that has made significant progress during the past decade. In this paper, we present a survey focused on automatic speech recognition (ASR) for these languages. We first define under-resourced languages and the challenges associated with them. The main part of the paper is a literatur...

An Iterative approach to extract dictionaries from Wikipedia for under-resourced languages

The problem of extracting bilingual dictionaries from Wikipedia is well known and well researched. Given the structural and rich multilingual content of Wikipedia, a language independent approach is necessary for extracting dictionaries for various languages more so for under-resourced languages. In our attempt to mine dictionaries for under-resourced languages, we developed an iterative approa...

Collecting Bilingual Audio in Remote Indigenous Communities

Most of the world’s languages are under-resourced, and most under-resourced languages lack a writing system and literary tradition. As these languages fall out of use, we lose important sources of data that contribute to our understanding of human language. The first, urgent step is to collect and orally translate a large quantity of spoken language. This can be digitally archived and later tra...

Using Resource-Rich Languages to Improve Morphological Analysis of Under-Resourced Languages

The worldwide proliferation of digital communications has created the need for language and speech processing systems for under-resourced languages. Developing such systems is challenging if only small data sets are available, and the problem is exacerbated for languages with highly productive morphology. However, many under-resourced languages are spoken in multi-lingual environments together ...

Speech data collection in an under-resourced language within a multilingual context

In this paper, we present an end-to-end solution to the development of an automatic speech recognition (ASR) system for typical under-resourced languages, where the target language is likely to be influenced by one or more embedded foreign languages. We first describe the collection and processing of the text corpus crawled from the World Wide Web using the Rapid Language Adaptation Toolkit. In par...

Journal:
  • Speech Communication

Volume 56

Publication date: 2014